14 research outputs found

    hdData360r: A high-dimensional panel data compiler for governance, trade, and competitiveness indicators of World Bank Group platforms

    The World Bank Group’s GovData360 and TCdata360 platforms are widely used in socio-economic research. The thousands of governance, trade, and competitiveness indicators they contain have served as the basis for much research in economic, developmental, and cultural studies, coronavirus disease 2019 (COVID-19) research, and tourism, to name a few fields. The R package presented here, hdData360r, collects thousands of up-to-date annual indicators from these platforms for all countries worldwide. It also supports imputing missing values with data from previous years and can optionally export the generated dataset to tab-separated value (TSV) files. The hdData360r package, together with a sample dataset it generates, is publicly available on GitHub and Mendeley Data.
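The imputation and export steps described in this abstract can be illustrated with a minimal Python sketch. It is not the hdData360r implementation (which is an R package); the function names `impute_from_previous_years` and `to_tsv` and the sample values are hypothetical, and the sketch assumes that "imputation with data from previous years" means carrying the most recent earlier observation forward.

```python
import csv
import io


def impute_from_previous_years(series):
    """Fill each missing value (None) with the most recent earlier observation.

    Values that have no earlier observation stay None.
    """
    filled = []
    last_seen = None
    for value in series:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def to_tsv(rows, header):
    """Serialize rows to tab-separated text (hypothetical helper)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up annual values for a single country and indicator.
years = [2017, 2018, 2019, 2020]
indicator = [0.52, None, 0.61, None]

filled = impute_from_previous_years(indicator)
tsv_text = to_tsv(zip(years, filled), ["year", "value"])
```

Carrying values forward only, never backward, keeps a later observation from leaking into earlier years.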

    Projektek nyomon követése mátrixokkal = Comprehensive planning and coordinating matrix

    Because they can handle iterative relationships between activities, matrix-based planning methods have become essential for product and software development projects. These methods not only make it possible to define activities and the connections between them, but also allow us to schedule (Eppinger et al., 1994) and track (Minogue, 2011) simpler, linear projects. The goal of this research, building mainly on Minogue’s (2011) results, was to create a matrix-based method that makes the planning of multiple projects, and even multiprojects, transparent and trackable, and that records and processes data about their realization.

    Generalized network-based dimensionality analysis

    Network analysis opens new horizons for data analysis methods, as the results of ever-developing network science can be integrated into classical data analysis techniques. This paper presents the generalized version of network-based dimensionality reduction and analysis (NDA). The main contributions of this paper are as follows: (1) The proposed generalized network-based dimensionality reduction and analysis (GNDA) method handles both low-dimensional high-sample-size (LDHSS) and high-dimensional low-sample-size (HDLSS) data. In addition, we show that, among the methods compared, only the proposed GNDA method adequately estimates the number of latent variables (LVs). (2) The proposed GNDA accepts any symmetric or nonsymmetric similarity function between indicators (i.e., variables or observations) to specify LVs. (3) The proposed prefiltering and resolution parameters yield a hierarchical version of GNDA that can be used to check the robustness of LVs. The proposed GNDA method is compared with traditional dimensionality reduction methods on various simulated and real-world datasets.

    The Role of Societal Aspects in the Formation of Official COVID-19 Reports: A Data-Driven Analysis

    This paper investigates the role of socioeconomic considerations in the formation of official COVID-19 reports. To this end, we employ a dataset that contains 1159 pre-processed indicators from the World Bank Group GovData360 and TCdata360 platforms and an additional 8 COVID-19 variables generated from the reports of 138 countries. During the analysis, a rank-correlation-based complex method is used to identify the time- and space-varying relations between pandemic variables and the main topics of the World Bank Group platforms. The results not only draw attention to the importance of factors such as air traffic, tourism, and corruption in report formation but also support further discipline-specific research by mapping and monitoring a wide range of such relationships. Source code written in R is attached, which allows the analysis to be customized and provides up-to-date results.
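As a rough illustration of the rank-correlation building block mentioned in this abstract, the following Python sketch computes a Spearman coefficient between two variables. The abstract does not say which rank correlation the authors use, so Spearman is an assumption here, and the function names are hypothetical; the paper's full method is more involved than this single coefficient.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because it uses ranks only, the coefficient is insensitive to monotone rescaling of either variable, which is convenient when indicators are measured on very different scales.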

    Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS)

    In recent years, several screening methods have been published for ultrahigh-dimensional data that contain hundreds of thousands of features, many of which are irrelevant or redundant. However, most of these methods cannot handle data with thousands of classes. Prediction models built to authenticate users based on multichannel biometric data give rise to this type of problem. In this study, we present a novel method known as random forest-based multiround screening (RFMS) that can be effectively applied under such circumstances. The proposed algorithm divides the feature space into small subsets and executes a series of partial model builds. These partial models are used to implement tournament-based sorting and the selection of features based on their importance. This algorithm successfully filters irrelevant features and also discovers binary and higher-order feature interactions. To benchmark RFMS, a synthetic biometric feature space generator known as BiometricBlender is employed. Based on the results, RFMS is on par with industry-standard feature screening methods, while simultaneously possessing many advantages over them.
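The subset-wise, tournament-style idea described in this abstract can be sketched in Python as follows. This is not the authors' RFMS algorithm: the function `multiround_screening` and all of its parameters are hypothetical, and the random forest is replaced by a caller-supplied `score_subset` function that returns an importance value per feature. In practice, that function might fit a random forest on the subset and return its feature importances.

```python
import random


def multiround_screening(feature_ids, score_subset, subset_size,
                         keep_per_subset, target_count, seed=0):
    """Repeatedly split features into small subsets and keep each subset's best.

    score_subset(subset) must return a dict mapping feature id -> importance.
    """
    if not 0 < keep_per_subset < subset_size:
        raise ValueError("keep_per_subset must be in [1, subset_size - 1]")

    rng = random.Random(seed)
    survivors = list(feature_ids)

    # Each round keeps fewer features than it started with, so this terminates.
    while len(survivors) > subset_size:
        rng.shuffle(survivors)
        next_round = []
        for start in range(0, len(survivors), subset_size):
            subset = survivors[start:start + subset_size]
            scores = score_subset(subset)  # one "partial model" per subset
            ranked = sorted(subset, key=lambda f: scores[f], reverse=True)
            next_round.extend(ranked[:keep_per_subset])
        survivors = next_round

    # Final round: rank the remaining features together.
    scores = score_subset(survivors)
    ranked = sorted(survivors, key=lambda f: scores[f], reverse=True)
    return ranked[:target_count]


# Stand-in scorer (an assumption): importance equals the feature id.
def toy_scores(subset):
    return {f: float(f) for f in subset}


selected = multiround_screening(range(20), toy_scores, subset_size=5,
                                keep_per_subset=2, target_count=2)
```

With a fixed importance score, the `keep_per_subset` best features overall always survive every round. With a real model the scores depend on which features share a subset, which is one reason to reshuffle between rounds.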